AITopics | differential description

Collaborating Authors

differential description

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition

Gan, Yaozong, Li, Guang, Togo, Ren, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki

arXiv.org Artificial IntelligenceSep-2-2024

We propose a new strategy called think twice before recognizing to improve fine-grained traffic sign recognition (TSR). Fine-grained TSR in the wild is difficult due to the complex road conditions, and existing approaches particularly struggle with cross-country TSR when data is lacking. Our strategy achieves effective fine-grained TSR by stimulating the multiple-thinking capability of large multimodal models (LMM). We introduce context, characteristic, and differential descriptions to design multiple thinking processes for the LMM. The context descriptions with center coordinate prompt optimization help the LMM to locate the target traffic sign in the original road images containing multiple traffic signs and filter irrelevant answers through the proposed prior traffic sign hypothesis. The characteristic description is based on few-shot in-context learning of template traffic signs, which decreases the cross-domain difference and enhances the fine-grained recognition capability of the LMM. The differential descriptions of similar traffic signs optimize the multimodal thinking capability of the LMM. The proposed method is independent of training data and requires only simple and uniform instructions. We conducted extensive experiments on three benchmark datasets and two real-world datasets from different countries, and the proposed method achieves state-of-the-art TSR results on all five datasets.

lmm, recognition, traffic sign, (14 more...)

arXiv.org Artificial Intelligence

2409.01534

Country:

Asia > Japan > Hokkaidō > Hokkaidō Prefecture > Sapporo (0.05)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.05)
Africa > Togo (0.05)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.67)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification

Esfandiarpoor, Reza, Bach, Stephen H.

arXiv.org Artificial IntelligenceMar-15-2024

A promising approach for improving the performance of vision-language models like CLIP for image classification is to extend the class descriptions (i.e., prompts) with related attributes, e.g., using brown sparrow instead of sparrow. However, current zero-shot methods select a subset of attributes regardless of commonalities between the target classes, potentially providing no useful information that would have helped to distinguish between them. For instance, they may use color instead of bill shape to distinguish between sparrows and wrens, which are both brown. We propose Follow-up Differential Descriptions (FuDD), a zero-shot approach that tailors the class descriptions to each dataset and leads to additional attributes that better differentiate the target classes. FuDD first identifies the ambiguous classes for each image, and then uses a Large Language Model (LLM) to generate new class descriptions that differentiate between them. The new class descriptions resolve the initial ambiguity and help predict the correct label. In our experiments, FuDD consistently outperforms generic description ensembles and naive LLM-generated descriptions on 12 datasets. We show that differential descriptions are an effective tool to resolve class ambiguities, which otherwise significantly degrade the performance. We also show that high quality natural language class descriptions produced by FuDD result in comparable performance to few-shot adaptation methods. What is the most distinguishing characteristic of a sparrow? To distinguish it from what?

ambiguous class, class description, differential description, (14 more...)

arXiv.org Artificial Intelligence

2311.07593

Country:

North America > United States > Tennessee (0.06)
North America > Mexico > Veracruz (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Colorado > El Paso County > Colorado Springs (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.48)

Industry:

Transportation > Passenger (0.68)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback